5.1 - Goal - Learning a Combat Skill - Kiting
This chapter marks a transition from macro-level economic tasks to unit micro-management. Our third agent will be trained to master a fundamental combat tactic known as kiting, or "stutter-stepping."
The goal is to move beyond resource management and teach an agent to make optimal, split-second decisions in a dynamic, adversarial combat scenario.
The Desired Behavior - A State-Based Kiting Policy
Kiting is a hit-and-run policy that lets a ranged unit defeat a faster melee attacker: fire, retreat while the weapon reloads, then fire again. The agent must learn to switch between "Attack" and "Move" states based on its weapon cooldown and its distance to the target.
The Ideal Policy (Finite State Machine):
                          Target is in range
    +------------------+ -----------------------> +------------------+
    |      State:      |                          |      State:      |
    |  MOVING TOWARDS  | <----------------------- |    ATTACKING     |
    |      TARGET      |  Target is out of range  |                  |
    +------------------+                          +---+----------+---+
                                                      |          ^
                                  Weapon is on        |          |  Weapon cooldown
                                  cooldown            |          |  is over
                                                      v          |
                                               +------+----------+------+
                                               |         State:         |
                                               |  KITING (MOVING AWAY)  |
                                               +------------------------+
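For reference, the sketch below expresses this state machine in plain Python. It is not the learned agent, just a hand-coded baseline for the behavior the agent should discover on its own; the `KitingState` enum and `next_state` function are illustrative names, not part of any library.

```python
from enum import Enum, auto

class KitingState(Enum):
    MOVING_TOWARDS = auto()
    ATTACKING = auto()
    KITING = auto()

def next_state(state: KitingState, target_in_range: bool,
               weapon_on_cooldown: bool) -> KitingState:
    """Transition function for the ideal kiting policy diagrammed above."""
    if state is KitingState.MOVING_TOWARDS:
        # Close the gap until the target enters weapon range.
        return KitingState.ATTACKING if target_in_range else state
    if state is KitingState.ATTACKING:
        if not target_in_range:
            return KitingState.MOVING_TOWARDS
        if weapon_on_cooldown:
            # The shot is away; retreat while the weapon reloads.
            return KitingState.KITING
        return state
    # KITING: keep moving away until the weapon is ready to fire again.
    return KitingState.ATTACKING if not weapon_on_cooldown else state
```

A scripted bot stepping this function every frame will stutter-step correctly; the RL agent must learn the same switching behavior from reward alone.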
Success Criteria
- The agent (a single Marine) must learn to consistently defeat a single, faster Zergling.
- The agent's policy must correctly utilize `weapon_cooldown` to decide when to move.
- The agent's policy must correctly utilize `target_in_range` to decide when to attack or reposition.
Controlled Scenario Setup
To isolate the kiting task, we will use debug commands to create a perfect, repeatable 1v1 scenario at the start of each episode (a sketch of this reset logic follows the list below).
- Spawn one friendly Marine for the agent to control.
- Spawn one enemy Zergling at a fixed distance.
- Immediately remove all starting worker units.
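A minimal sketch of this reset, assuming the python-sc2 (BurnySC2) client: `debug_create_unit` and `debug_kill_unit` are debug commands that library exposes, but the spawn coordinates and the `setup_1v1_scenario` helper are illustrative.

```python
from sc2.ids.unit_typeid import UnitTypeId
from sc2.position import Point2

MARINE_SPAWN = Point2((30, 30))    # illustrative map coordinates
ZERGLING_SPAWN = Point2((38, 30))  # fixed spawn distance from the Marine

async def setup_1v1_scenario(bot):
    """Reset the map to a clean 1v1: one Marine (ours) vs. one Zergling."""
    # Remove every starting worker so combat is the only learning signal.
    worker_tags = {w.tag for w in bot.workers}
    worker_tags |= {u.tag for u in bot.enemy_units if u.type_id == UnitTypeId.DRONE}
    if worker_tags:
        await bot.client.debug_kill_unit(worker_tags)
    # Spawn commands: [unit type, count, position, owning player].
    await bot.client.debug_create_unit([
        [UnitTypeId.MARINE, 1, MARINE_SPAWN, 1],     # player 1: the agent
        [UnitTypeId.ZERGLING, 1, ZERGLING_SPAWN, 2], # player 2: the opponent
    ])
```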
Environment Design
1. Observation Space Specification
The observation vector is designed to provide all critical information for a 1v1 combat engagement.
- Gymnasium Type: `gymnasium.spaces.Box`
- Shape: `(5,)`
Index | Feature | Rationale |
---|---|---|
0 | `marine.health_percentage` | "How much danger am I in?" |
1 | `marine.weapon_cooldown > 0` | (Key Signal) "Is my weapon ready to fire?" |
2 | `zergling.health_percentage` | "How close am I to winning?" |
3 | `marine.distance_to(zergling)` | "Is my positioning optimal?" |
4 | `marine.target_in_range` | (Key Signal) "Is attacking a valid move from this position?" |
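A sketch of the space and its encoder is below, assuming python-sc2-style unit attributes (`health`, `health_max`, `weapon_cooldown`, `distance_to`, `target_in_range`); the `MAX_DISTANCE` normalization constant and the `build_observation` helper are assumptions, not values from the text.

```python
import numpy as np
import gymnasium as gym

observation_space = gym.spaces.Box(low=0.0, high=1.0, shape=(5,), dtype=np.float32)

MAX_DISTANCE = 20.0  # assumed cap for normalizing distance into [0, 1]

def build_observation(marine, zergling) -> np.ndarray:
    """Encode the 1v1 engagement as the 5-feature vector from the table above."""
    return np.array([
        marine.health / marine.health_max,        # 0: how much danger am I in?
        float(marine.weapon_cooldown > 0),        # 1: key signal - weapon status
        zergling.health / zergling.health_max,    # 2: how close am I to winning?
        min(marine.distance_to(zergling) / MAX_DISTANCE, 1.0),  # 3: positioning
        float(marine.target_in_range(zergling)),  # 4: key signal - attack valid?
    ], dtype=np.float32)
```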
2. Action Space Specification
The agent is given a set of discrete combat maneuvers.
- Gymnasium Type: `gymnasium.spaces.Discrete`
- Size: `3`
Action | Agent's Intent |
---|---|
0 | Attack Target: Engage the enemy. |
1 | Move Away: Increase distance (the "kite" action). |
2 | Move Towards: Decrease distance (the "chase" action). |
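A sketch of the space plus a dispatcher that turns an action index into a unit order, again assuming the python-sc2 command API (`unit.attack`, `unit.move`, `Point2.towards`); `KITE_STEP` and `execute_action` are illustrative.

```python
import gymnasium as gym

action_space = gym.spaces.Discrete(3)

KITE_STEP = 3.0  # illustrative distance for each kite move order

def execute_action(action: int, marine, zergling) -> None:
    """Translate a discrete action index into a unit order."""
    if action == 0:
        marine.attack(zergling)  # 0: Attack Target
    elif action == 1:
        # 1: Move Away - a negative distance in towards() steps away from the target.
        marine.move(marine.position.towards(zergling.position, -KITE_STEP))
    else:
        marine.move(zergling.position)  # 2: Move Towards
```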
3. Reward Function Specification
The reward must directly incentivize combat efficiency.
Design Philosophy:
- Health Differential: The primary learning signal is the change in relative health between the two units. This directly rewards dealing damage while avoiding taking damage.
- Large Terminal Reward: A definitive win/loss signal at the end of the episode provides a clear, final objective.
Reward Calculation (on each step):
reward = (previous_zergling_hp - current_zergling_hp) - (previous_marine_hp - current_marine_hp)
This creates a zero-sum reward where positive values mean the agent won the trade, and negative values mean it lost the trade. A large terminal reward (`+100` for a win, `-100` for a loss) is added at the end of the episode.
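Putting both pieces together, here is a sketch of the full per-step reward; the function name and the win check against the Zergling's remaining health are illustrative.

```python
WIN_REWARD = 100.0  # magnitude of the terminal win/loss bonus

def compute_reward(prev_marine_hp: float, cur_marine_hp: float,
                   prev_zergling_hp: float, cur_zergling_hp: float,
                   done: bool) -> float:
    """Health differential plus a terminal win/loss bonus."""
    # Damage dealt this step minus damage taken this step.
    reward = (prev_zergling_hp - cur_zergling_hp) - (prev_marine_hp - cur_marine_hp)
    if done:
        # Win if the Zergling died; otherwise the Marine did.
        reward += WIN_REWARD if cur_zergling_hp <= 0 else -WIN_REWARD
    return reward
```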